Abstract: Social media could provide valuable information to support decision making in crisis management, such as in accidents, explosions and fires. However, much of the data from social media are images, which are uploaded at a rate that makes it impossible for human beings to analyze them. Despite the many works on image analysis, there are no fire detection studies on social media. To fill this gap, we propose the use and evaluation of a broad set of content-based image retrieval and classification techniques for fire detection. Our main contributions are: (i) the development of the Fast-Fire Detection method (FFireDt), which combines feature extractors and evaluation functions to support instance-based learning; (ii) the construction of an annotated set of images with ground-truth depicting fire occurrences - the Flickr-Fire dataset; and (iii) the evaluation of 36 efficient image descriptors for fire detection. Using real data from Flickr, our results showed that FFireDt was able to achieve a precision for fire detection comparable to that of human annotators. Therefore, our work shall provide a solid basis for further developments on monitoring images from social media.


1 INTRODUCTION 


Disasters in industrial plants, densely populated areas, or even crowded events may impact property, environment, and human life. For this reason, a fast response is essential to prevent or reduce injuries and financial losses when crisis situations strike. The management of such situations is a challenge that requires fast and effective decisions based on the best data available, because decisions based on incorrect or missing information may cause more damage (Russo, 2013). One of the alternatives to improve information correctness and availability for decision making during crises is the use of software systems to support experts and rescue forces (Kudyba, 2014).


Systems aimed at supporting salvage and rescue teams often rely on images to understand the crisis scenario and to design the actions that will reduce losses. Crowdsourcing and social media, as massive sources of images, possess a great potential to assist such systems. Web sites such as Flickr, Twitter, and Facebook allow users to upload pictures from mobile devices, which generates a flow of images that carries valuable information. Such information may reduce the time spent to make decisions and it can be used along with other information sources. Automatic image analysis is important to understand the dimensions, type, and the objects and people involved in an incident.

Despite the potential benefits, we observed that there is still a lack of studies concerning automatic content-based processing of crisis images (Villela et al., 2014). In the specific case of fire - which is observed during explosions, car accidents, and forest and building fires, to name a few - there is an absence of studies to identify the most adequate content-based retrieval techniques (image descriptors) able to identify and retrieve relevant images captured during crises. In this work, we fill one of these gaps by providing an architecture and an evaluation of the techniques for fire monitoring in images collected from social media.

This work reports on one of the steps of the project Reliable and Smart Crowdsourcing Solution for Emergency and Crisis Management - RESCUER¹. The project goal is to use crowdsourcing data (image, video, and text captured with mobile devices) to assist in rescue missions. This paper describes the evaluation of techniques to detect fire in image data, one of the project targets.

¹ http://www.rescuer-project.org/

We use real images from Flickr², a well-known social media website from which we collected a large set of images that were manually annotated as having fire or not fire. We used this dataset as a ground-truth to evaluate image descriptors in the task of detecting fire. Our main contributions are the following:

1. Curation of the Flickr-Fire Dataset: a vast 
human-annotated dataset of real images suitable 
as ground-truth for the development of content- 
based techniques for fire detection; 

2. Development of FFireDt: we propose the Fast-Fire Detection and Retrieval (FFireDt), a scalable and accurate architecture for automatic fire detection, designed over image descriptors able to detect fire, which achieves a precision comparable to that of human annotation;

3. Evaluation: we soundly compare the precision 
and performance of several image descriptors for 
image classification and retrieval. 

Our results provide a solid basis to choose the most adequate pair of feature extractor and evaluation function for the task of fire detection, as well as a comprehensive discussion of how the many existing alternatives work in such a task.

The remainder of this paper is structured as follows: Section 2 presents the related work; Section 3 presents the main concepts regarding fire detection in images; Section 4 presents the methodology; Section 5 describes the experiments and discusses their results; finally, Section 6 presents the conclusions.


² https://www.flickr.com/


2 RELATED WORK


Previous efforts on mining information from sets of images include detecting social events and tracking the corresponding related topics, which can even include the identification of tourist attractions (Tamura et al., 2012).

Distinctly, in this paper we are interested in the following problem: Given a collection of photos, possibly obtained from a social media service, how can we efficiently detect fire? Interesting approaches related to fire motion analysis on video are not applicable to static images (Chunyu et al., 2010), and most of these approaches were found not to work with satisfactory performance (Celik et al., 2007; Ko et al., 2009; Liu and Ahuja, 2004).

Some of the previous works propose the construction of a particular color model focused on fire, based on Gaussian differences (Celik et al., 2007) or on spectral characteristics to identify fire, smoke, heat or radiation. The spectral color model has been used along with spatial correlation and a stochastic model to capture fire motion (Liu and Ahuja, 2004). However, such a technique requires a set of images and is not suitable for individual images, which are the frequent case on social media.

Other studies employ a variation of the combination given by a color model transform plus a classifier. This combination is employed in the work of Dimitropoulos et al. (2014), which represents each frame according to the most prominent texture and shape features. It also combines such representation with spatio-temporal motion features and employs an SVM to detect fire in videos. However, this approach is neither scalable nor suitable for fire detection on still images.

On the other hand, the feature extraction methods available in the MPEG-7 Standard have been used for image representation in fast-response systems that deal with large amounts of data (Doeller and Kosch, 2008; Ojala et al., 2002; Tjondronegoro and Chen, 2002). However, to the best of our knowledge, there is no study employing those extractors for fire detection. Moreover, despite these multiple approaches, there is no conclusive work about which image descriptors are suitable to identify fire in images.


3 BACKGROUND

3.1 Content-based model for retrieval and classification


The most usual approach to retrieve and classify images by content relies on representing them using feature extractor techniques (Guyon et al., 2006). After extraction, the images can be retrieved by comparing their feature vectors using an evaluation function, which is usually a metric or a divergence function. Comparison is a required step in image retrieval systems. Also, Instance-Based Learning (IBL) classifiers use the evaluation among the images of a given set to label them regarding their visual content (Bedo et al., 2014; Aha et al., 1991). Such concepts can be formalized by the following definitions.


Definition 1. Feature Extraction Method (FEM): A feature extraction method is a non-bijective function that, given an image domain $\mathbb{I}$, is able to represent any image $i_q \in \mathbb{I}$ in a domain $\mathbb{F}$ as $f_q$. Each value $f_q$ is called a feature vector (FV) and represents characteristics of an image $i_q$.


In this paper, we use FEMs to represent images in multidimensional domains. Therefore, the image feature vectors can be compared according to the next definition:

Definition 2. Evaluation Function (EF): Given the feature vectors $f_i$, $f_j$ and $f_k \in \mathbb{F}$, an evaluation function $\delta : \mathbb{F} \times \mathbb{F} \to \mathbb{R}$ is able to compare any two elements from $\mathbb{F}$. The EF is said to be a metric distance function if it complies with the following properties:

• Symmetry: $\delta(f_i, f_j) = \delta(f_j, f_i)$,

• Non-negativity: $0 \le \delta(f_i, f_j) < \infty$,

• Triangular inequality: $\delta(f_i, f_j) \le \delta(f_i, f_k) + \delta(f_k, f_j)$.

The FEM defines the element distribution in the multidimensional space. On the other hand, the evaluation function defines the behavior of the searching functionalities. Therefore, the combination of FEM and EF is the main parameter to improve or decrease accuracy and quality for both classification and retrieval. Formally, this association can be defined as:

Definition 3. Image Descriptor (ID): An image descriptor is a pair $\langle \varepsilon, \delta \rangle$, where $\varepsilon$ is a (composition of) FEM and $\delta$ is a (weighted) EF.

By employing a suitable image descriptor, it is possible to inspect the neighborhood of a given element considering previously labeled cases. This course of action is the principle of the Instance-Based Learning algorithms, which rely on previously labeled data to classify new elements according to their nearest neighbors. The sense of what “nearest” means is provided by the EF. Formally, this operation must respect the definition below:

Definition 4. k Nearest-Neighbors - kNN: Given an image $i_q$ represented as $f_q \in \mathbb{F}$, an image descriptor $ID = \langle \varepsilon, \delta \rangle$, a number of neighbors $k \in \mathbb{N}$ and a set $F \subset \mathbb{F}$ of images, the k-Nearest Neighbors set is the subset $kNN \subseteq F$, with $|kNN| = k$, such that $kNN = \{ f_n \in F \mid \forall f_i \in F \setminus kNN;\ \delta(f_n, f_q) \le \delta(f_i, f_q) \}$.

Since the kNN performance relies on the capability of the image descriptor to define the image representations and the search space, the descriptor becomes the critical point to be defined in an image retrieval system. In the following, we review, experiment with, and report on multiple possibilities of image descriptors for fire detection.
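To make Definition 4 concrete, the following is a minimal sketch of a kNN query over already-extracted feature vectors. The function names, the toy two-dimensional vectors, and the choice of the Euclidean distance as the evaluation function are assumptions made only for illustration; they are not the implementation used in this work.

```python
import heapq
import math


def euclidean(x, y):
    """Euclidean distance, used here as the evaluation function (EF)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn(query_fv, labeled_fvs, k, ef=euclidean):
    """Return the k (distance, label) pairs closest to query_fv.

    labeled_fvs is an iterable of (feature_vector, label) pairs.
    """
    scored = ((ef(query_fv, fv), label) for fv, label in labeled_fvs)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


# Hypothetical labeled cases (toy 2-D feature vectors).
cases = [
    ([0.9, 0.1], "fire"),
    ([0.8, 0.2], "fire"),
    ([0.1, 0.9], "not fire"),
    ([0.2, 0.7], "not fire"),
]
neighbors = knn([0.85, 0.15], cases, k=2)
```

Swapping the `ef` argument changes the image descriptor while the query logic stays the same, which mirrors the separation between FEM and EF in Definition 3.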

3.2 MPEG-7 Feature Extraction 
Methods 

The MPEG-7 standard was proposed by the ISO/IEC JTC1 (IEEE MultiMedia, 2002). It defines expected representations for images regarding color, texture and shape. The set of proposed feature extraction methods was designed to process the original image as fast as possible, without taking into account specific image domains. The original proposal of MPEG-7 is composed of two parts: high-level and low-level values, both intended to represent the image. The low-level value is the representation of the original data by a FEM. On the other hand, the high-level feature requires examination by an expert.

The goal of MPEG-7 is to standardize the representation of streamed or stored images. The low-level FEMs are widely employed to compare and to filter data, based purely on content. These FEMs are meaningful in the context of various applications according to several studies (Doeller and Kosch, 2008; Tjondronegoro and Chen, 2002). They are also supposed to define objects by including color patches, shapes or textures. The MPEG-7 standard defines the following set of low-level extractors (Sato et al., 2010):


• Color: Color Layout, Color Structure, Scalable Color and Color Temperature, Dominant Color, Color Correlogram, Group-of-Frames;

• Texture: Edge Histogram, Texture Browsing, 
Homogeneous Texture; 

• Shape: Contour Shape, Shape Spectrum, Region 
Shape; 


We highlight that shape FEMs usually depend on previous object definitions. As the goal of this study is to define the best setting for automatic classification and retrieval for fire detection, user interaction in the extraction process is not suitable for our proposal. Thus, we focus only on the color and texture extractors. In this study we employ the following MPEG-7 extractors: Color Layout, Color Structure, Scalable Color, Color Temperature, Edge Histogram, and Texture Browsing. They are explained in the next sections.


3.2.1 Color Layout 


The MPEG-7 Color Layout (CL) (Kasutani and Yamada, 2001) describes the image color distribution considering spatial location. It splits the image into square sub-regions (the number of sub-regions is a parameter) and labels each square with the average color of the region. Figure 1(b) depicts the regions for Figure 1(a) according to the Color Layout extractor. Next, the average colors are transformed to the YCbCr space and a Discrete Cosine Transformation is applied over each band of the YCbCr region values. The low-frequency coefficients are extracted through a zig-zag image reading. In order to reduce dimensionality, only the most prominent frequencies are employed in the feature vector.
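The Color Layout steps (block averaging, YCbCr conversion, DCT per band, zig-zag reading, truncation) can be sketched as follows. This is an illustrative pure-Python sketch, not the extractor used in the experiments: the 8 × 8 grid, the number of coefficients kept per band (6, 3, 3), the BT.601 conversion constants, and all function names are assumptions, and the image must be at least as large as the grid in both dimensions.

```python
import math


def rgb_to_ycbcr(r, g, b):
    """BT.601 full-range RGB -> YCbCr conversion (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (list of rows)."""
    n = len(block)

    def alpha(k):
        return math.sqrt((1.0 if k == 0 else 2.0) / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def zigzag(matrix):
    """Read a square matrix in zig-zag order, low frequencies first."""
    n = len(matrix)
    order = sorted(
        ((i, j) for i in range(n) for j in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )
    return [matrix[i][j] for i, j in order]


def color_layout(image, grid=8, keep=(6, 3, 3)):
    """image: list of rows of (r, g, b) tuples; returns a CL-style vector."""
    height, width = len(image), len(image[0])
    bands = [[[0.0] * grid for _ in range(grid)] for _ in range(3)]
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = [sum(p[c] for p in pixels) / len(pixels) for c in range(3)]
            # Convert the averaged color of the sub-region to YCbCr.
            for band, value in zip(bands, rgb_to_ycbcr(*mean)):
                band[gy][gx] = value
    features = []
    for band, count in zip(bands, keep):
        features.extend(zigzag(dct_2d(band))[:count])
    return features


# A uniform gray image has energy only in the DC coefficient of each band.
gray = [[(100, 100, 100)] * 16 for _ in range(16)]
fv = color_layout(gray)
```

For the uniform image above, only the first coefficient of each band is non-zero, which illustrates why truncating the zig-zag sequence keeps the coarse color layout while discarding fine detail.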

3.2.2 Scalable Color 

The MPEG-7 Scalable Color (SC) (Manjunath et al., 2001) aims at capturing the prominent color distribution. It is based on four stages. First, all pixels are converted from the RGB color-space to the HSV space; second, a normalized color histogram is constructed; third, the color histogram is quantized using 256 levels of the HSV space. Finally, a Haar wavelet transformation is applied over the resulting histogram (Ojala et al., 2002).
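The Scalable Color stages can be sketched in a few lines. The sketch below is only an approximation under stated assumptions: the 256 bins are obtained with a 16 × 4 × 4 partition of H, S and V, the Haar step uses plain pairwise sums and differences, and the function names and sample pixels are hypothetical.

```python
import colorsys


def hsv_histogram(pixels, h_bins=16, s_bins=4, v_bins=4):
    """Normalized HSV histogram with h_bins * s_bins * v_bins levels."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
    total = sum(hist) or 1.0
    return [count / total for count in hist]


def haar_transform(hist):
    """Full 1-D Haar decomposition using pairwise sums and differences."""
    coeffs = list(hist)
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while n > 1:
        sums = [coeffs[i] + coeffs[i + 1] for i in range(0, n, 2)]
        diffs = [coeffs[i] - coeffs[i + 1] for i in range(0, n, 2)]
        details = diffs + details  # coarser levels go first
        coeffs = sums
        n //= 2
    return coeffs + details


# Hypothetical pixels: two reddish, one blue, one dark gray.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (20, 20, 20)]
feature_vector = haar_transform(hsv_histogram(pixels))
```

The first coefficient is the sum of the whole histogram, and the remaining ones describe differences at progressively finer scales, which is what makes the representation scalable: a prefix of the vector is already a coarse descriptor.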

3.2.3 Color Structure 


The MPEG-7 Color Structure (CS) expresses both spatial and color distribution (Sikora, 2001). The extractor splits the original image into a set of color structures with fixed-size windows. Each fixed-size window selects equally spaced pixels to represent the local color structure, as depicted in Figure 2(a).

Figure 2: (a) Color structure with a defined window (b) Local histogram.

The window size and the number of local structures are parameters of CS (Manjunath et al., 2001). For each color structure, a quantization based on HMMD - a color-space derived from HSV that represents color differences - is executed. Then a local “histogram” based on HMMD is built. It stores the presence or absence of the quantized color instead of its distribution along the window (Figure 2(b)). The resulting feature vector is the accumulated distribution of the local histograms according to the previous quantization.

3.2.4 Edge Histogram 

The MPEG-7 Edge Histogram (EH) aims at capturing local and global edges. It defines five types of edges (Figure 3) regarding N × N blocks, where N is an extractor parameter. Each block is constructed by partitioning the original image into square regions.

Figure 3: Edge types: (a) Vertical (b) Horizontal (c) 45 degree (d) 135 degree (e) non-directional.

After applying the masks shown in Figure 3 to an image, it is possible to compute the local edge histograms. At this stage, the entire histogram is composed of 5 × N bins, but it is biased by local edges. To circumvent this problem, a variation (Park et al., 2000) was proposed to capture also semi-local edges. Figure 4 illustrates how the 13 semi-local edges are calculated. The horizontal semi-local edges are evaluated first, then the vertical ones and finally the five combinations of the super block edges.

Figure 4: 13 regions corresponding to semi-local edge histograms.

The resulting feature vector is composed of N plus thirteen edge histograms, which represent the local and the semi-local distribution, respectively.
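As an illustration of how a block can be assigned to one of the five edge types, the sketch below applies one 2 × 2 mask per edge type to the mean intensities of the four quadrants of a block and keeps the strongest response. The mask coefficients are the ones commonly quoted for the MPEG-7 edge histogram descriptor and, together with the threshold value and the function names, should be read as assumptions rather than as the exact configuration used in this work.

```python
import math

SQRT2 = math.sqrt(2.0)

# One 2x2 mask per edge type, applied to the quadrant means in the order
# (top-left, top-right, bottom-left, bottom-right). Assumed coefficients.
EDGE_FILTERS = {
    "vertical": (1.0, -1.0, 1.0, -1.0),
    "horizontal": (1.0, 1.0, -1.0, -1.0),
    "diagonal_45": (SQRT2, 0.0, 0.0, -SQRT2),
    "diagonal_135": (0.0, SQRT2, -SQRT2, 0.0),
    "non_directional": (2.0, -2.0, -2.0, 2.0),
}


def classify_block(quadrant_means, threshold=11.0):
    """Return the dominant edge type of a block, or None if it has no edge."""
    strengths = {
        name: abs(sum(c * m for c, m in zip(coeffs, quadrant_means)))
        for name, coeffs in EDGE_FILTERS.items()
    }
    best = max(strengths, key=strengths.get)
    return best if strengths[best] >= threshold else None


def edge_histogram(blocks):
    """Count edge types over an iterable of quadrant-mean tuples."""
    hist = {name: 0 for name in EDGE_FILTERS}
    for quadrant_means in blocks:
        edge = classify_block(quadrant_means)
        if edge is not None:
            hist[edge] += 1
    return hist


# Hypothetical blocks: bright-left/dark-right, bright-top/dark-bottom, flat.
blocks = [(200, 50, 200, 50), (200, 200, 50, 50), (100, 100, 100, 100)]
hist = edge_histogram(blocks)
```

Computing one such histogram per image region, and then over the semi-local groupings of regions, yields the local and semi-local parts of the feature vector described above.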

3.2.5 Color Temperature 


The main hypothesis supporting the MPEG-7 Color Temperature (CT) is that there is a correlation between the “feeling of image temperature” and illumination properties. Formally, the proposal considers a theoretical object called black body, whose color depends on its temperature (Wnukowicz and Skarbek, 2003). Figure 5 depicts the locus of the theoretical black body, according to the Planck formula, changing from 2000 Kelvin (red) to 25000 Kelvin (blue).

Figure 5: CIE color system and black body locus indicated by the in-out points.

The feature vector represents the linearized pixels in the XYZ space. This is performed by iteratively discarding every pixel with luminance Y above the given threshold - a parameter of the FEM. Thereafter, the average color coordinate in XYZ is converted to UCS. Finally, the two closest isotemperature lines are calculated from the given color diagrams (Wnukowicz and Skarbek, 2003). The formula for the resulting color temperature depends on the average point, the closest isolines and the distances among them.


3.2.6 Texture Browsing


The MPEG-7 Texture Browsing extractor (TB) is obtained from Gabor filters applied to the image (Lee and Chen, 2005). The parameters of this FEM are the same as those used in Gabor filtering. Figure 6(b) illustrates the result of using the Gabor filter to process image (a) following a particular setting. The Texture Browsing feature vector is composed of 12 positions: 2 to represent regularity, 6 for directionality and 4 for coarseness.

Figure 6: (a) Original Image (b) Gabor filter with a kernel setting.

The regularity features represent the degree of regularity of the texture structure as a more/less regular pattern, in such a way that the more regular a texture, the more robust the representation of the other features is. The directionality defines the most dominant texture orientation. This feature is obtained by providing an orientation variation for the Gabor filters. Finally, the coarseness represents the two dominant scales of the texture.


3.3 Evaluation Functions

An Evaluation Function expresses the proximity between two feature vectors. We are interested in feature extractors that generate the same number of features for each image; thus, in this paper we account only for evaluation functions for multidimensional spaces. Particularly, we employed distance functions (metrics) and divergences as evaluation functions. Suppose two feature vectors $X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$ of dimensionality $n$. Table 1 shows the EFs implemented, according to their evaluation formulas.

Table 1: Evaluation functions: their classification as metric distance functions and respective formulas.

Name                          Metric   Formula
City-Block                    Yes      $\sum_{i=1}^{n} |x_i - y_i|$
Euclidean                     Yes      $\sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
Chebyshev                     Yes      $\max_{1 \le i \le n} |x_i - y_i|$
Canberra                      Yes      $\sum_{i=1}^{n} \frac{|x_i - y_i|}{|x_i| + |y_i|}$
Kullback-Leibler Divergence   No       $\sum_{i=1}^{n} x_i \ln \frac{x_i}{y_i}$
Jeffrey Divergence            No       $\sum_{i=1}^{n} (x_i - y_i) \ln \frac{x_i}{y_i}$

The most widely employed metric distance functions are those related to the Minkowski family: the Manhattan, Euclidean and Chebyshev (Zezula et al., 2006). A variation of the Manhattan distance is the Canberra distance, which results in distances in the range [0,1]. These four EFs satisfy the properties of Definition 2. Therefore, they are metric distance functions.

However, there are non-metric distance functions that are useful for image classification and retrieval. The Kullback-Leibler Divergence, for instance, follows neither the triangular inequality nor the symmetry property. A symmetric variation of the Kullback-Leibler distance is the Jeffrey Divergence, yet it still is not a metric because it does not comply with the triangular inequality.
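The six evaluation functions listed in Table 1 can be written compactly as plain functions over two equal-length vectors. The sketch below uses the textbook definitions; the handling of zero denominators in Canberra, the small `eps` smoothing in the divergences, and the form of the Jeffrey divergence (taken here as the symmetrized Kullback-Leibler sum) are assumptions of this illustration and may differ from the implementation evaluated in the experiments.

```python
import math


def city_block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    # Terms whose denominator is zero are skipped (assumption of this sketch).
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)


def kullback_leibler(x, y, eps=1e-12):
    # eps avoids log(0) and division by zero; it is not part of the definition.
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(x, y))


def jeffrey(x, y, eps=1e-12):
    # Symmetrized Kullback-Leibler divergence.
    return kullback_leibler(x, y, eps) + kullback_leibler(y, x, eps)


# Two hypothetical normalized histograms.
p = [0.5, 0.5]
q = [0.9, 0.1]
asymmetric = kullback_leibler(p, q) != kullback_leibler(q, p)
```

The last line illustrates why the Kullback-Leibler Divergence is not a metric: swapping its arguments changes the result, whereas the Jeffrey form is symmetric by construction.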


3.4 Instance-Based Learning - IBL 

The main hypothesis for IBL classification is that an unlabeled feature vector (FV) pertains to the same class as its k Nearest-Neighbors, according to a predefined rule. Such a classifier relies on three resources:

1. An evaluation function, which evaluates the proximity between two FVs;

2. A classification function, which receives the nearest FVs to classify the unlabeled one - commonly considering the majority of retrieved FVs;

3. A concept description updater, which maintains the record of previous classifications.

Variations of these parts define different IBL versions. For instance, the IB1 - probably the most widely adopted IBL algorithm - adopts the majority of the retrieved elements as the classification rule and keeps no record of previous classifications.

The kNN process is able to solve all steps required by IB1. Moreover, it can be seamlessly integrated with the concept of similarity queries by employing extended-SQL expressions (Bedo et al., 2014). This database-driven approach to solve IB1 may reduce the time to obtain the final classification by orders of magnitude, besides the obvious gains obtained by structuring queried data following the entity-relationship model.
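The IB1 rule described in this section (retrieve the k nearest labeled cases, take the majority label, keep no record of previous classifications) can be sketched in memory as below. This is an assumption-laden illustration: it uses a linear scan instead of the database-driven similarity queries mentioned above, and the function names, the default k, and the toy cases are hypothetical.

```python
import math
from collections import Counter


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def ib1_classify(query_fv, labeled_fvs, k=3, ef=euclidean):
    """Label query_fv with the majority class among its k nearest cases."""
    ranked = sorted(labeled_fvs, key=lambda case: ef(query_fv, case[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled cases (toy 2-D feature vectors).
cases = [
    ([0.9, 0.1], "fire"),
    ([0.8, 0.2], "fire"),
    ([0.7, 0.3], "fire"),
    ([0.1, 0.9], "not fire"),
    ([0.2, 0.8], "not fire"),
]
label_a = ib1_classify([0.75, 0.25], cases)
label_b = ib1_classify([0.15, 0.85], cases)
```

Because the classifier stores nothing beyond the labeled cases, changing the evaluation function passed as `ef` is enough to test a different image descriptor.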


4 PROPOSED METHOD

4.1 Dataset Flickr-Fire


We used the Flickr API³ to download 5,962 images (no duplicates) under the Creative Commons license. The images were retrieved using textual queries such as: “fire car accident”, “criminal fire”, and “house burning”. Figures 7 and 8 illustrate samples of the obtained images, which we named Flickr-Fire. Even with queries related to fire, some of the images did not contain visual traces of fire, so each image was manually annotated to define a coherent ground-truth dataset.

To perform the annotation, we asked 7 subjects, all of them aged between 20 and 30 years, familiar with the issue, and not color-blind. Each subject was given a subset with 1,589 images to annotate as containing or not containing traces of fire. For images in which the annotations disagreed, we asked a third subject to provide an annotation. The average disagreement was 7.2%.

In order to balance the class distribution of the dataset, we randomly removed images to have 1,000 images containing fire and 1,000 images without fire. We made the dataset available online⁴, aiming at the reproducibility of our experiments.

Figure 7: Sample images labeled as 'fire' from dataset Flickr-Fire.

Figure 8: Sample images labeled as 'not-fire' from dataset Flickr-Fire.

³ The Flickr API is available at: www.flickr.com/services/api/

⁴ The Flickr-Fire dataset is available at: www.gbdi.icmc.usp.br


4.2 The Architecture of Fast-Fire Detection


Table 2: Feature Extractor Method acronyms used in the experiments.

Feature Extractor Method   Acronym
Color Layout               CL
Scalable Color             SC
Color Structure            CS
Color Temperature          CT
Edge Histogram             EH
Texture Browsing           TB


Table 3: Evaluation Function acronyms used in the experiments.

Evaluation Function Name      Acronym
City-Block                    CB
Euclidean                     EU
Chebyshev                     CH
Canberra                      CA
Kullback-Leibler Divergence   KU
Jeffrey Divergence            JF


Here we introduce the Fast-Fire Detection (FFireDt) architecture, which uses image descriptors for image retrieval and classification. The architecture is organized in modules that implement the concepts reviewed in Section 3. Figure 9 illustrates the relationship among the modules, their communication and how they relate to a relational database management system (RDBMS).

Figure 9: Architecture of the FFireDt. The Evaluating Module receives an unlabeled image, represents it by executing feature extractor methods and labels it by using the Instance-Based Learning Module. The system output (image plus label) interacts with the experts, who may also perform a similarity query.

The feature extraction methods module (the FEM module) accepts any kind of feature extractor. For this work, we implemented six extractors following the MPEG-7 standard: Color Layout, Scalable Color, Color Structure, Edge Histogram, Color Temperature, and Texture Browsing - explained in Section 3.2. The evaluation functions module (the EF module) is also prepared for general implementations; for this work, we implemented six functions: City-Block, Euclidean, Chebyshev, Jeffrey Divergence, Kullback-Leibler Divergence, and Canberra. The acronyms of the feature extractor methods and evaluation functions are listed in Tables 2 and 3.


The architecture also has an Instance-Based Learning module (the IBL module), which classifies images by labeling them with one of the classes {fire, not fire}. The IBL module receives as input the unclassified images, one image descriptor (a pair of feature extractor and evaluation function) and the set of past cases correctly labeled. This module is assisted by a similarity retrieval subsystem, which executes the kNN queries necessary for the instance-based learning.

We assume that a flow of images is fed to the Fast-Fire Detection architecture. As each image arrives, it is stored in the RDBMS along with the corresponding vectors of the extracted features. Then, the IBL module classifies each unlabeled image based on the required descriptor.

This architecture was implemented as an API integrated with an RDBMS, in which the user can create his/her own image descriptor by combining a FEM and an EF to perform image classification. The user employs an SQL extension as the front-end for the architecture. The extension is able to execute similarity retrieval (through the Image Retrieval Module) and classification.


5 EXPERIMENTS 

In this section, we search for the combination of classifiers and image descriptors that is the most suitable for FFireDt in the fire detection task. We evaluate the impact of the image descriptors by creating a candidate set of 36 descriptors, given by the combination of the 6 feature extractors with the 6 evaluation functions considered in this work, and executing the IB1 classifier over the Flickr-Fire dataset. The experiments were performed using the following procedure:

1. Calculate the F-measure metric to evaluate the efficacy of the experimental setting;

2. Select the top-six image descriptors according to the F-measure to generate Precision-Recall plots, bringing more details about the behavior of the techniques;

3. Validate our partial findings using Principal Component Analysis to plot the feature vectors of the extractors;

4. Employ the top-three image descriptors according to the previous measures to perform a ROC curve evaluation, providing the analysis about the most accurate FFireDt setting;

5. Evaluate the efficiency of the proposed FFireDt architecture, measuring the wall-clock time considering the multiple configurations of the descriptors.

5.1 Obtaining the F-measure 

To determine the most suitable FFireDt setting, we employed the F-measure, which relies on measuring the number of true positives (TP), false positives (FP) and false negatives (FN). The TP are the images containing fire that are correctly labeled as 'fire', while the FN are those labeled as 'not fire' although being fire images. The FP are the images labeled as 'fire' but containing no traces of fire. The F-measure is given by $F = 2\,TP / (2\,TP + FP + FN)$.
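As a small worked example of the F-measure formula, the sketch below counts TP, FP and FN from two parallel lists of labels. The function name and the five-image toy example are hypothetical and serve only to show the arithmetic.

```python
def f_measure(predicted, actual, positive="fire"):
    """F-measure computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)  # labeled fire, no fire
    fn = sum(p != positive and a == positive for p, a in pairs)  # fire missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical labels for five images: TP = 2, FP = 1, FN = 1.
predicted = ["fire", "fire", "not fire", "fire", "not fire"]
actual = ["fire", "not fire", "fire", "fire", "not fire"]
score = f_measure(predicted, actual)
```

With TP = 2, FP = 1 and FN = 1 the formula gives 4 / 6, that is, roughly 0.667. True negatives do not enter the computation.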

We calculated the F-measure for the 36 image descriptors using 10-fold cross validation. That is, for each round of evaluation, we used one tenth of the dataset to train the IB1 classifier and the remaining data for tests. This is performed 10 times and then the average F-measure is calculated.


Table 4 presents the F-measure values for all the 36 combinations of feature extractor/evaluation function. The highest values obtained for each row are highlighted. The experiment revealed that distinct descriptor combinations have an impact on fire detection. More specifically, the accuracy of extractors based on color is better than that of the extractors based on texture (Edge Histogram and Texture Browsing). Moreover, the extractors Color Layout and Color Structure have shown the best efficacy for fire detection, in combination respectively with the evaluation functions Euclidean and Jeffrey Divergence.

The highlighted values are pointed out as the best setting for tuning FFireDt. In addition, notice that the best descriptor achieved an accuracy of 85%, which is close to that of the human labeling process, whose accuracy was 92.8%.

Table 4: F-Measure for each pair of feature extractor method (rows) versus evaluation function (columns). For each feature extractor, the evaluation function with the highest F-Measure is highlighted (marked with *).

        Evaluation Functions
FEM   CB       EU       CH       CA       KU       JF
CL    0.834    0.847*   0.807    0.828    0.803    0.844
SC    0.843*   0.827    0.811    0.835    0.671    0.798
CS    0.853    0.849    0.821    0.848    0.746    0.866*
CT    0.799    0.798    0.798    0.800*   0.734    0.799
EH    0.808    0.806    0.795    0.806    0.462    0.815*
TB    0.766*   0.762    0.745    0.751    0.571    0.755
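The selection of the best evaluation function per feature extractor can be reproduced directly from the values in Table 4. The sketch below enumerates the 36 candidate descriptors and picks, for each FEM, the EF with the highest F-measure; the variable names are illustrative, while the numbers are copied from the table.

```python
from itertools import product

EFS = ["CB", "EU", "CH", "CA", "KU", "JF"]

# F-measure values from Table 4, one row per feature extractor.
F_MEASURE = {
    "CL": [0.834, 0.847, 0.807, 0.828, 0.803, 0.844],
    "SC": [0.843, 0.827, 0.811, 0.835, 0.671, 0.798],
    "CS": [0.853, 0.849, 0.821, 0.848, 0.746, 0.866],
    "CT": [0.799, 0.798, 0.798, 0.800, 0.734, 0.799],
    "EH": [0.808, 0.806, 0.795, 0.806, 0.462, 0.815],
    "TB": [0.766, 0.762, 0.745, 0.751, 0.571, 0.755],
}

# Every <FEM, EF> pair is a candidate image descriptor.
descriptors = list(product(F_MEASURE, EFS))

# For each FEM, keep the EF with the highest F-measure.
best_ef = {
    fem: EFS[max(range(len(EFS)), key=scores.__getitem__)]
    for fem, scores in F_MEASURE.items()
}

# Rank the six winning descriptors by their F-measure.
ranking = sorted(best_ef.items(),
                 key=lambda item: max(F_MEASURE[item[0]]),
                 reverse=True)
```

The resulting ranking starts with the pairs (CS, JF), (CL, EU) and (SC, CB), which are the three descriptors with the highest F-measure in the table.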


We also compared the best combination achieved by the IB1 classifier, as reported in Table 4, with other classifiers. This was performed to assure that the instance-based learning (the FFireDt approach) is the most adequate classification strategy. In these experiments, we tuned FFireDt to employ the best EF for each FEM, as reported in Table 4. We also grouped the results according to the employed FEM.

Table 5 shows the FFireDt results compared to the Naive-Bayes, J48, and Random Forest classifiers. The results show that FFireDt achieved the best F-Measure in all but one of the classification configurations. Random Forest classification using the Scalable Color extractor beat FFireDt, although by a narrow F-Measure margin. Thus, we can conclude that IB1 is adequate to fulfill the classification purpose in FFireDt.

5.2 Precision-Recall 


We extended the analysis of Section 5.1 


measur¬ 


ing the Precision and Recall of each configuration, 
which are values also employed on E-measure. Such 
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5.3 Visualization of feature extractors 


5 EXPERIMENTS 



-1 -0.5 0 0.5 1 -2 0 2 4 -1 -0.5 0 0.5 1 

1st Principal Component 1st Principal Component 1st Principal Component 

Figure 10: PCA projection of fire and not-fire images: (a) Color Layout, (b) Color Structure, (c) Scalable Color and (d) Edge 
Histogram. The Color Layout visually separates the dataset into two clusters. 


Table 5: FFireDt obtained the highest F-Measure for all
but one FEM when compared to other classifiers. For each
feature extractor, we highlighted strategy with the highest
F-Measure.

       Classifiers
FEM    FFireDt   Naive-Bayes   J48     Random Forest

CL     0.847     0.787         0.751   0.829
SC     0.843     0.808         0.845   0.864
CS     0.866     0.406         0.842   0.866
CT     0.800     0.341         0.800   0.774
EH     0.815     0.522         0.711   0.787
TB     0.766     0.476         0.706   0.723


further analysis allows a better understanding of the
behavior of the image descriptors and, more specifically,
of the feature extractors. A Precision vs.
Recall (PxR) curve is suitable to measure the number
of relevant images regarding the number of retrieved
elements. We used Precision vs. Recall as a
complementary measure to determine the potential of each
image descriptor in the FFireDt setting. A rule of
thumb for reading PxR curves is: the closer to the
top, the better the result. Accordingly, we consider
only the most efficient combination of each feature
extractor, as highlighted in Table 4: ID1 <CS, JF>,
ID2 <CL, EU>, ID3 <SC, CB>, ID4 <EH, JF>,
ID5 <CT, CA> and ID6 <TB, CB>. Figure 11
confirms that the image descriptors ID1, ID2, and ID3 are
in fact the most effective combinations for fire
detection. It also shows that, for those three descriptors,
the precision is at least 0.8 for a recall of up to 0.5,
dropping almost linearly with a small slope, which
can be considered acceptable. This observation
reinforces the findings of the F-measure metric, indicating
that the behavior of the descriptors is homogeneous
and well-suited for the task of retrieval and,
consequently, for classification purposes.

Figure 11: Precision vs. Recall graphs for each of the
most precise image descriptor combinations according to
F-measure.
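
To make the relation between Precision, Recall, and F-measure concrete, the sketch below computes the points of a PxR curve from a ranked retrieval result and the F-measure as the harmonic mean of the two values. The function names and the toy ranking are assumptions for illustration; they are not taken from the FFireDt code.

```python
def precision_recall_points(ranked_relevance):
    """Precision and recall after each retrieved element.

    ranked_relevance: booleans, True where the retrieved element is relevant,
    ordered from most to least similar to the query.
    """
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        raise ValueError("the ranking contains no relevant element")
    points, hits = [], 0
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        hits += bool(is_relevant)
        points.append((hits / k, hits / total_relevant))
    return points

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical ranking: True marks a retrieved image that depicts fire.
ranking = [True, True, False, True, False]
points = precision_recall_points(ranking)
```

Plotting the second element of each pair (recall) against the first (precision) gives one PxR curve; a curve that stays near the top corresponds to rankings in which relevant images are retrieved first.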

5.3 Visualization of feature extractors 

Based on the results shown so far, we hypothesize that
the Color Structure, Color Layout, and Scalable Color
extractors are the most adequate for the FFireDt
setting. In this section, we look for further evidence,
using visualization techniques based on Principal
Component Analysis (PCA) to better understand the
feature space of the extractors. PCA takes as
input the extracted features, which may have several
dimensions according to the FEM domain, and
reduces them to two features. Such reduction allows
us to visualize the data as a scatter plot.
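
The reduction described above can be sketched as follows, assuming the extracted features are available as a NumPy matrix with one row per image. The function name `pca_project_2d` and the random stand-in data are hypothetical; the sketch uses a plain SVD-based PCA and does not claim to match the tooling used in the experiments.

```python
import numpy as np

def pca_project_2d(features):
    """Project row vectors onto their first two principal components."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
# Hypothetical stand-in for extracted features: 100 images, 64 dimensions.
features = rng.normal(size=(100, 64))
projected = pca_project_2d(features)
```

Each row of `projected` is a point for the scatter plot; coloring the points by their fire or not-fire label shows how well an extractor separates the two classes.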

Our hypothesis gains credibility if the
corresponding visualizations show a better
separation of the classes fire and not-fire, in comparison
to the other three extractors. Figures 10(a)-10(d)
show the two-dimensional projection of
the data, plotting the first two principal components of the
PCA processing of the space generated by each
extractor. Figure 10(a) depicts the representation of the
data space generated by CL, the extractor that
presented the better separability in the classification
process. The two clusters can be seen as two well-formed
clouds with a reasonably small overlap, splitting
the images as containing fire or not.

Figure 10(b) shows the data visualization of the
space generated by CS, which was the FEM that
obtained the highest F-measure in the previous
experiments. The data projection shows that each cluster
forms a clearly identifiable cloud, with the centers
of the clouds distinctly separated. However, this
figure reveals that there is a large overlap between the
two classes. Figure 10(c) presents the projection of
the space generated by the SC extractor. Again, it can
be seen that there are two clusters, but with an even
larger overlap between them. Visually, CL
outperformed the other color FEMs, CS and SC, when
drawing the border between the two classes.


Figure 10(d) depicts the visualization generated
by the EH extractor. It can be seen that it indeed
has two clouds: one almost vertical to the left and
another along the "diagonal" of the figure. However,
the two clouds are not related to the existence of fire,
as the elements of both classes are distributed over
both clouds. Figures 10(a), 10(b) and 10(c) show
the visualization of the extractors based on color.
These visualizations show that the corresponding CL,
CS and SC indeed generate clusters. However, there
are increasingly larger overlaps between fire and
not-fire instances. Regarding the TB and CT features, the
PCA projection was not able to separate the fire and
not-fire classes. In conclusion, the visualization of the
feature spaces shows that extractors based on color
are able to separate the data into visual clouds related
to the expected clusters. In particular, Color Layout
has shown the best visualization, followed by Color
Structure and Scalable Color, which have also been shown
to significantly separate the classes. However, the
extractors based on texture identify characteristics that
are not related to fire, thus presenting the worst
separability, as expected.


5.4 ROC curves 


Finally, we employed one last accuracy measure
to define the FFireDt setting: the ROC curve. It
allows us to determine the overall accuracy of the
experiments, using measures of sensitivity and specificity.
Figure 12 presents the detailed ROC curves for image
descriptors ID1 = <CS, JF>, ID2 = <CL, EU>, and
ID3 = <SC, CB>, the top three combinations
according to the F-Measure, Precision-Recall and
Visualization experiments. For fire detection, the area
under the ROC curve was up to 0.93 for ID1; up to 0.87
for ID2; and up to 0.85 for ID3.
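
For reference, the quantities behind a ROC analysis can be sketched as below. The area under the curve is computed through its rank interpretation (the probability that a randomly chosen fire image receives a higher score than a randomly chosen not-fire image, with ties counted as one half). The scores, labels, and function names are hypothetical, and how a score is derived from each image descriptor is left outside this sketch.

```python
def sensitivity_specificity(scores, labels, threshold):
    """True-positive rate and true-negative rate at a score threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    return tp / positives, tn / negatives

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical fire scores; True marks an image annotated as fire.
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [True, True, False, True, False]
auc = roc_auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
```

Sweeping the threshold over all score values and plotting sensitivity against one minus specificity traces the ROC curve whose area `roc_auc` returns.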

These results indicate that the top three image
descriptors have similar and satisfactory accuracy.
Therefore, the choice of which descriptor to use
becomes a matter of performance. In the next section,
we evaluate the performance to determine which is the
most adequate descriptor.

Figure 12: ROC curves for the top three image descriptors
in the task of fire detection.

5.5 Processing Time and Scalability 

When monitoring images originated from social
media, the time constraint is important because of
the high rate at which new images arrive. Thus,
we also evaluate the efficiency, given in wall-clock
time, of the candidate image descriptors. We ran
the experiments on a personal computer equipped
with an Intel Core i7 processor at 2.67 GHz and 4 GB
of memory, running Ubuntu 14.04 LTS.

Feature Extractors 



Figure 13: Highest precision when classifying the
Flickr-Fire dataset vs. average time to extract the features of
one image, for the six feature extractors.


Figure 13 shows
the average time required to perform the feature
extraction in FFireDt on the Flickr-Fire dataset.
Color Structure, the most precise extractor, was the
second fastest. The second and third most precise
extractors were Color Layout and Scalable Color: the
former was three times slower than Color Structure,
and the latter was the fastest extractor. Thus, we are
now able to state that the extractors Color Structure
and Scalable Color are the best choices for fire
detection in image streams. Meanwhile, the texture-based
extractors Edge Histogram and Texture Browsing
presented low performance, so they are
dismissed as possible choices.

Evaluation Functions 

Figure 14 shows the time required to perform 2 trillion
evaluation calculations for each evaluation function
on feature vectors of 256 dimensions. The plot of
average precision vs. wall-clock time shows that,
although the Jeffrey Divergence demonstrated the highest
precision, it was the least efficient. In turn,
the City-Block and Euclidean distances presented
excellent performance and a precision only slightly
below that of the Jeffrey Divergence. Therefore, we can say
that they are the most adequate evaluation functions
when performance is a concern, as is the
case in our problem domain. Finally, we conclude
that the image descriptors given by the combinations
{CS, SC} × {CB, EU} are the best options in terms
of both efficacy (precision) and efficiency (wall-clock
time). In Table 6, we reproduce the F-measure results,
highlighting the most adequate combinations
according to our findings.
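
The three evaluation functions discussed above can be sketched as follows. The City-Block and Euclidean distances are standard; for the Jeffrey Divergence, the sketch assumes the symmetric form commonly used for histogram comparison, with m = (h + k) / 2, which may differ in detail from the variant used in the experiments. The `eps` smoothing term and the toy histograms are likewise assumptions.

```python
import numpy as np

def city_block(h, k):
    """City-Block (L1) distance."""
    h, k = np.asarray(h, dtype=float), np.asarray(k, dtype=float)
    return float(np.abs(h - k).sum())

def euclidean(h, k):
    """Euclidean (L2) distance."""
    h, k = np.asarray(h, dtype=float), np.asarray(k, dtype=float)
    return float(np.sqrt(((h - k) ** 2).sum()))

def jeffrey_divergence(h, k, eps=1e-12):
    """Jeffrey Divergence between two histograms (assumed form).

    eps avoids log(0) for empty bins; it is an assumption of this sketch.
    """
    h = np.asarray(h, dtype=float) + eps
    k = np.asarray(k, dtype=float) + eps
    m = (h + k) / 2.0
    return float((h * np.log(h / m) + k * np.log(k / m)).sum())

# Hypothetical 4-bin histograms.
h = np.array([0.5, 0.5, 0.0, 0.0])
k = np.array([0.25, 0.25, 0.25, 0.25])
d_cb = city_block(h, k)
d_eu = euclidean(h, k)
d_jf = jeffrey_divergence(h, k)
```

In this form the Jeffrey Divergence evaluates two logarithms per dimension, whereas City-Block needs only a subtraction and an absolute value, which is one plausible reason for the difference in wall-clock time reported above.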



Figure 14: Precision when classifying the Flickr-Fire
dataset vs. time to perform 2 trillion calculations,
for the six evaluation functions.


Table 6: F-Measure for each pair of feature extractor
method (rows) and evaluation function (columns), now
highlighting the best combinations according to our
experiments.

       Evaluation Functions
FEM    CB     EU     CH     JF     KU     CA

CL     0.834  0.847  0.807  0.844  0.803  0.828
CS     0.853  0.849  0.821  0.866  0.746  0.848
SC     0.843  0.827  0.811  0.798  0.671  0.835
EH     0.808  0.806  0.795  0.815  0.462  0.806
CT     0.799  0.798  0.798  0.799  0.734  0.800
TB     0.766  0.762  0.745  0.755  0.571  0.751


From these experiments, we point out that the best
image descriptor for fire detection, considering the
built dataset, is given by the combinations of the
MPEG-7 Color Structure and Scalable Color
extractors with the distance functions City-Block and
Euclidean. These combinations provide not only more
efficacy, but also more efficiency. In general, we
noticed that feature extractors based on color were more
effective than extractors based on texture. We also
identified that the Jeffrey Divergence was the most
accurate evaluation function; however, it was also the
most expensive.


6 CONCLUSIONS 

We worked on the problem of identifying fire in
social-media image sets in order to assist rescue
services during emergency situations. The approach was
based on an architecture for content-based image
retrieval and classification. Using this architecture, we
compared the accuracy and performance (processing
time) of image descriptors (pairs of feature
extractor and evaluation function) in the task of
identifying fire in the images. As ground truth, we built a
dataset with 2,000 human-annotated images obtained
from the Flickr website. Our contributions in this paper
are summarized as follows:

1. Dataset Flickr-Fire: we built a varied
human-annotated dataset of real images suitable as
ground truth to foster the development of more
precise techniques for automatic identification of
fire;

2. Fast-Fire Detection (FFireDt) Architecture: we
designed and implemented a flexible, scalable,
and accurate method for content-based image
retrieval and classification to be used as a model for
future developments in the field;

3. Evaluation of existing techniques: we compared
36 combinations of MPEG-7 feature extractors
with evaluation functions from the literature,
considering their potential for accurate retrieval
using F-measure and Precision-Recall. As a result,
we achieved 85% accuracy, precise classification
(ROC curves), and efficient performance
(wall-clock time).

Our results showed that FFireDt was able to
achieve a precision for the fire detection task that
is comparable to that of human annotators. We
conclude by stressing the importance of monitoring
images from social media, including situations in which
decision making can benefit from precise and accurate
information. However, to take advantage of such images,
automated tools are required that can flag them
from social media as soon as possible, and this
work is an important step toward that goal.


International Conference on Enterprise Information Systems, 2015
Student Best Paper Award
SCITEPRESS Copyright
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